Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Classification of symbolic sequences with multi-order Markov model
CHENG Lingfang, GUO Gongde, CHEN Lifei
Journal of Computer Applications    2017, 37 (7): 1977-1982.   DOI: 10.11772/j.issn.1001-9081.2017.07.1977
Abstract565)      PDF (956KB)(367)       Save
To solve the problem that the existing methods based on the fixed-order Markov models cannot make full use of the structural features involved in the subsequences of different orders, a new Bayesian method based on the multi-order Markov model was proposed for symbolic sequences classification. First, a Conditional Probability Distribution (CPD) model was built based on the multi-order Markov model. Second, a suffix tree for n-order subsequences with efficient suffix-tables and its efficient construction algorithm were proposed, where the algorithm could be used to learn the multi-order CPD models by scanning once the sequence set. A Bayesian classifier was finally proposed for the classification task. The training algorithm was designed to learn the order-weights for the models of different orders based on the Maximum Likelihood (ML) method, while the classification algorithm was defined to carry out the Bayesian prediction using the weighted conditional probabilities of each order. A series of experiments were conducted on real-world sequence sets from three domains and the results demonstrate that the new classifier is insensitive to the predefined order change of the model. Compared with the existing methods such as the support vector machine using the fixed-order model, the proposed method can achieve more than 40% improvement on both gene sequences and speech sequences in terms of classification accuracy, yielding reference values for the optimal order of a Markov model on symbolic sequences.
Reference | Related Articles | Metrics
Soft subspace clustering algorithm for imbalanced data
CHENG Lingfang, YANG Tianpeng, CHEN Lifei
Journal of Computer Applications    2017, 37 (10): 2952-2957.   DOI: 10.11772/j.issn.1001-9081.2017.10.2952
Abstract521)      PDF (935KB)(672)       Save
Aiming at the problem that the current K-means-type soft-subspace algorithms cannot effectively cluster imbalanced data due to uniform effect, a new partition-based algorithm was proposed for soft subspace clustering on imbalanced data. First, a bi-weighting method was proposed, where each attribute was assigned a feature-weight and each cluster was assigned a cluster-weight to measure its importance for clustering. Second, in order to make a trade-off between attributes with different types or those categorical attributes having various numbers of categories, a new distance measurement was then proposed for mixed-type data. Third, an objective function was defined for the subspace clustering algorithm on imbalanced data based on the bi-weighting method, and the expressions for optimizing both the cluster-weights and feature-weights were derived. A series of experiments were conducted on some real-world data sets and the results demonstrated that the bi-weighting method used in the new algorithm can learn more accurate soft-subspace for the clusters hidden in the imbalanced data. Compared with the existing K-means-type soft-subspace clustering algorithms, the proposed algorithm yields higher clustering accuracy on imbalanced data, achieving about 50% improvements on the bioinformatic data used in the experiments.
Reference | Related Articles | Metrics